Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis

Authors

Keyan Zahedi

Georg Martius

Nihat Ay

Abstract

One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviors. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. Previous experiments have shown that the predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviors, because a maximization of the PI corresponds to an exploration of morphology- and environment-dependent behavioral regularities. The idea is that these regularities can then be exploited in order to solve any given task. Three different experiments are presented, and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic policy gradient setting. Only for hard tasks can a large speed-up be achieved, and then at the cost of a loss in asymptotic performance.
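The quantity discussed in the abstract can be illustrated with a minimal sketch. It assumes a one-dimensional sensor stream, equal-width binning, and a plug-in (histogram) estimate of the one-step PI, I(S_t; S_{t+1}); the function names, the bin count `n_bins`, and the weight `beta` are hypothetical choices for illustration and are not taken from the paper, whose actual estimator and weighting are not given here.

```python
import numpy as np

def one_step_predictive_information(sensor_stream, n_bins=8):
    """Plug-in estimate of I(S_t; S_{t+1}) in bits for a 1-D sensor stream."""
    s = np.asarray(sensor_stream, dtype=float)
    # Assumption of this sketch: discretize into n_bins equal-width bins.
    edges = np.linspace(s.min(), s.max(), n_bins + 1)
    symbols = np.digitize(s, edges[1:-1])  # integer symbols in 0..n_bins-1
    # Count transitions between consecutive symbols, then normalize.
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (symbols[:-1], symbols[1:]), 1.0)
    joint /= joint.sum()
    p_past = joint.sum(axis=1, keepdims=True)    # marginal of S_t
    p_future = joint.sum(axis=0, keepdims=True)  # marginal of S_{t+1}
    product = p_past @ p_future                  # independent baseline
    nz = joint > 0                               # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / product[nz])))

def combined_episode_return(external_reward, sensor_stream, beta=0.1):
    """Linear combination: external episode reward plus beta times the PI."""
    pi = one_step_predictive_information(sensor_stream)
    return external_reward + beta * pi

# Usage: a strictly alternating stream is fully predictable one step ahead.
alternating = [0.0, 1.0] * 50 + [0.0]
episode_return = combined_episode_return(2.0, alternating, beta=0.5)
```

In an episodic policy gradient method, a scalar such as `episode_return` would stand in for the plain external return of the episode; the size of `beta` decides how strongly the intrinsic term competes with the task reward, which is the trade-off the abstract reports on.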

Similar resources

Information-driven intrinsic motivation in reinforcement learning

One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviours. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and fut...

Identifying key steps in developing a one-stop shop for health policy and system information in a limited-resource setting: A case study

Background: There is limited understanding about the development of the online one-stop shops for evidence in a limited-resource setting, such as Uganda. This study aimed to provide a comprehensive account of the development process of the online resource for local policy and systems-relevant information in this setting. Methods: We utilized a case study design to address our objective where ...

Power and Agenda-Setting in Tanzanian Health Policy: An Analysis of Stakeholder Perspectives

Background Global health policy is created largely through a collaborative process between development agencies and aid-recipient governments, yet it remains unclear whether governments retain ownership over the creation of policy in their own countries. An assessment of the power structure in this relationship and its influence over agenda-setting is thus the first step towards understanding w...

Implementing Bounded Linear Programming and Analytical Network Process Fuzzy Models to Motivate Employees: a Case Study

In this research, the factors affecting university employees’ motivation and productivity are identified and classified in seven groups; the impact of each motivation factor on the productivity is presented by ANP fuzzy model. Eight universities in Iran were analyzed in this research work. The aim of this study is to explore the productivity of employees. This paper attempts to give new insights ...

Finite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting

In reinforcement learning (RL), one of the key components is policy evaluation, which aims to estimate the value function (i.e., expected long-term accumulated reward) of a policy. With a good policy evaluation method, the RL algorithms will estimate the value function more accurately and find a better policy. When the state space is large or continuous, Gradient-based Temporal Difference (GTD) ...

Volume 4

Publication date: 2013
